Skip to content

Conversation

@JehandadKhan
Copy link
Collaborator

Motivation

In many instances it is useful to have symbols in the wheel to facilitate triage etc.

Technical Details

When the build.py script is triggered, we add the appropriate bazel flags to retain symbols in the different SO files in our wheels.

Test Plan

The produced artifacts are manually tests using the nm -D command to ensure that symbols are present.

Test Result

The output of the nm -D command shows that the symbols are indeed present in all the SOs in the wheel

Submission Checklist

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds debugging symbols to wheel builds in the CI to facilitate triage and debugging. The changes include:

  • Adding Bazel build options to retain symbols (--strip=never, -g3)
  • Updating Docker setup to use a more generic Python venv package

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
tools/docker_dev_setup.sh Changes Python venv package from version-specific (python3.10-venv) to generic (python3-venv)
jax_rocm_plugin/build/rocm/tools/build_wheels.py Adds Bazel options to disable stripping and enable debug symbols in wheel builds
Comments suppressed due to low confidence (1)

jax_rocm_plugin/build/rocm/tools/build_wheels.py:180

  • When xla_path is set, a second --bazel_options argument is appended at line 180. Since argparse with action='append' creates a list, this will result in args.bazel_options containing two elements: the string '--strip=never --copt=-g3 --cxxopt=-g3' and the string '--override_repository=xla='. While the second option will work correctly, the first one will fail as explained in Comment 1. The fix should ensure all Bazel options are passed as individual entries.
        "--bazel_options=--strip=never --copt=-g3 --cxxopt=-g3",
        "--wheels=jax-rocm-plugin,jax-rocm-pjrt",
        "--rocm_path=%s" % rocm_path,
        "--rocm_version=%s" % version_string,
        "--use_clang=%s" % use_clang,
        "--verbose",
        "--output_path=%s" % output_dir,
    ]

    # Add clang path if clang is used.
    if use_clang:
        clang_path = find_clang_path()
        if clang_path:
            cmd.append("--clang_path=%s" % clang_path)
        else:
            raise RuntimeError("Clang binary not found in /usr/lib/llvm-*")

    if xla_path:
        cmd.append("--bazel_options=--override_repository=xla=%s" % xla_path)

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@Ruturaj4 Ruturaj4 self-requested a review November 14, 2025 17:05
"python",
"build/build.py",
"build",
"--bazel_options=--strip=never",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a full debug build with assertions enabled and minimal optimizations, there's also a debug config: .bazelrc:522

python build/build.py build ....
  --bazel_options=--config=debug

https://github.com/jax-ml/jax/blob/1d0d80c65ed4897e784ac898787cc0d7697ec134/.bazelrc#L521

and this is the option to turn it on or off @charleshofer

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you meant to link to our .bazelrc? It has the same config https://github.com/ROCm/rocm-jax/blob/master/jax_rocm_plugin/.bazelrc#L121

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But yes, we should use that instead, and modify it with whatever flags we want

Copy link
Collaborator

@charleshofer charleshofer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to talk to Aleksei about how this will relate to the debug flags he put in the Makefile template. We it'd be nice if we could unify on one set of debug flags that can just sit in .bazelrc under their own config: https://github.com/ROCm/rocm-jax/blame/master/stack.py#L28

make \
patchelf \
python3.10-venv \
python3-venv \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch

"python",
"build/build.py",
"build",
"--bazel_options=--strip=never",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer that as many Bazel options as possible make their way down to the jax_rocm_plugin/.bazelrc file rather than getting threaded through one of our dozen-or-so wrapper scripts. Preferably, it'd be inside of something like a build:debug config, so you can enable it with ./bazel build ... --config=debug ... or build/buid.py ... --bazel_options="--config=debug" ....

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point here is to enable symbols by default, so I can add the flags to our usual build, would that make sense to you ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense

"python",
"build/build.py",
"build",
"--bazel_options=--config=debug_symbols",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we stick this in the build/build.py instead rather than storing bazel arguments directly in here?

@JehandadKhan JehandadKhan marked this pull request as draft November 20, 2025 21:21
@JehandadKhan
Copy link
Collaborator Author

@Ruturaj4 and @charleshofer I believe I have addressed the required changes, can you please review. If the comments above are addressed please mark them as resolved

Copilot finished reviewing on behalf of JehandadKhan December 3, 2025 22:36
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

)
rocm_group.add_argument(
"--preserve_debug_symbols",
type=bool,
Copy link

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using type=bool for argparse arguments doesn't work as expected. In Python's argparse, bool() will return True for any non-empty string, including "False" or "false". This means passing --preserve_debug_symbols=false will still evaluate to True.

Consider using one of these patterns instead:

  1. Use action="store_true" with default=False if you want a simple flag that enables the feature when present
  2. Use type=utils._parse_string_as_bool (like the --use_clang argument on line 260) if you want to accept string values like "true"/"false"
  3. Use action=argparse.BooleanOptionalAction (Python 3.9+) which automatically creates --preserve_debug_symbols and --no-preserve_debug_symbols flags

Given that the default is True and this is meant to be configurable, option 2 would be most consistent with the existing codebase.

Suggested change
type=bool,
type=utils._parse_string_as_bool, # pylint: disable=protected-access

Copilot uses AI. Check for mistakes.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot is right @JehandadKhan . You should use action="store_true" and set the default to False.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants